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Abstract Neural networks have been applied to tasks in 
several areas of artificial intelligence, including vision, 
speech, and language. Relatively little work has been done 
in the area of problem solving. Two approaches to 
path-finding are presented, both using neural network 
techniques. Both techniques require a training period. 
Training under the back propagation (BPL) method was 
accomplished by presenting representations of [current 
position, goal position] pairs as input and appropriate 
actions as output. The Hebbian/interactive activation 
(HIA) method uses the Hebbian rule to associate points 
that are nearby. A path to a goal is found by activating a 
representation of the goal in the network and processing 
until the current position is activated above some threshold 
level. BPL, using back-propagation learning, failed to 
learn, except in a very trivial fashion, that is equivalent to 
table lookup techniques. HIA, performed much better, 
and required storage of fewer weights. In drawing a 
comparison, it is important to note that back propagation 
techniques depend critically upon the forms of 
representation used, and can be sensitive to parameters in 
the simulations; hence the BPL technique may yet yield 
strong results. 

Introduction 

Description of the problem. 

A map is given, which is a represention of landmarks and 
allowed paths and/or obstacles in the relevant region of 
space. Given an arbitrary pair of points, I (initial) and G 
(goal), the problem is to compute a sequence of actions 
which will bring the subject from I to G . Several 


approaches to this classic problem have been put forward 
(see for example, Brooks, 1983). These tend to rely upon 
explicit geometrical computations on polygonal 
representations of obstacles. In contrast, any geometrical 
considerations in the neural network approaches described 
below are implicit that is, they are emergent artifacts of the 
learning processes. Two principles for processing and 
training of neural networks are briefly described in this 
section. More detailed treatments can be found in the 
references. 

Back-propagation learning 

Back propagation (Rumelhart, Hinton, & Williams, 1986) 
is a general algorithmic framework for training a 
feed-forward network of semi-linear units by randomly 
selecting pairs of input-output patterns from a training set 
and incrementally adjusting the network parameters, such 
that the network produces the appropriate output for a 
given input. The parameters of the network are usually, 
but not necessarily, restricted to the weights on the links 
(edges, in graph theoretic terminology) between the units 
(nodes). Initially, the parameters are set to random values. 
With each presentation of an input-output pair, the network 
produces a response to the input, which is compared to the 
desired output; the back-propagation learning (BPL) 
algorithm specifies a method for adjusting the network 
parameters, such that the discrepancy between the 
response and the desired output is reduced. The procedure 
is based on a gradient descent of the parmeter vector across 
an error measure. Like other gradient descent techniques, 
BPL is not guaranteed to find the global minimum; instead, 
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it often gets stuck in lopal minima, which may nevertheless 
result in acceptable performance by the network. As 
originally conceived, BPL was limited to static patterns; 
however there has been recent progress in processing 
time-vaiying inputs, (e.g. Jordan, 1987; Elman, 1988). 

Hebbianllnteracive Activation 

An interactive activation network (McClelland & 
R ume 1 hart, 1981) consists of a population of " neuron- like" 
elements, each representing an identifiable concept, in 
most implementations. The nodes are connected with 
positive weights, if their concepts arc positively 
associated, and with negative weights if they are negatively 
associated. Normally, the weights are "hard-wired"; that 
is, the weights are preset and do not modify. However 
Hebb's postulate (1949) can be realized as a differential 
equation for learning in such networks, as has been done 
in other models, such as the Brain State in the Box (BSB) 
model of Anderson (1977). 

A BPL approach to navigation: 

Method and results 

The back propagation algorithm is typically applied to 
categorization problems, by learning an input-output 
mapping, where the inputs are exemplars and the outputs 
are categories. Jordan (1987) showed^iow a network 
could be trained to learn sequences, by partitioning the 
input into a represention labelling the sequence and a 
representation of one element of the sequence, and the 
output as a representation of the successor element in the 
sequence. Below, a similar scheme is applied to the 
navigation problem. The input is partitioned into a 
representation of the current state and a representation of 
the desired (goal) state. The output drives some sort of 
effector which changes the current state. The network 
architecture is shown in Figure 1. The current state and 


goal states are represented as patterns of activation across 
sets of input units. Each of the input units is connected to 
each of the "hidden" units (hidden, because they do not 
interact with the environment external to the network) in 
the next layer. Each of the hidden units is connected to 
each of the output units. The connection matrices are 
symbolized by the bold arrows in Figure 1. 
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Figure 1 , Basic route-finding network. Input units 
specify current and goal locations. The network 
generates an appropriate action as the output. The 
current location is updated according to the interaction 
of the action with the physics of the environment. 


Attempts were made to train such a network on a very 
simple environment, consisting of a 5 by 5 grid of cells, 
each acccssable by a single step from its 4 (N, E, S, and 
W) neighbors. With a small set of training data, the 
network was able to learn the steps in that set perfectly. 
However, if the set became too large, performance would 
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suffer. The performance of the network (with the addition 
of a second layer of hidden units) is shown in Figure 2. 
For each of the 25 possible goals, a five by five matrix of 
arrows depicts the motion taken by the network . A circle 
indicates where the position is identical to the goal; thus for 
each matrix the circle is the target. Note that while the 
trend is generally correct, the network makes errors that 
lead to dead ends (edges) or limitless oscillations (for 
example, when two arrows point toward each other). 
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Figure 2. The results from a fully connected BPL 
network. See text for description. 


It was generally found that learning was much better when 
the patterns were presented to the network independently 
of the previous pattern. In the initial investigations, a 
particular goal was held constant while the network was 
trained on a sequence of steps leading to the goal, after 
which the goal was shifted to a random location. 


However, under such a training schedule, the goal position 
can remain constant too long, such that the weights from 
the current position representation ’’forget" what they 
learned with respect to other goals. 
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Figure 3. In this network, two sets of units 
have been inserted between the input layer and the 
hidden unit layer. One has a set of one-to-one 
connections with the GOAL input units and is 
fully connected to the CURRENT input units, 
and the other is connected in a complementary 
fashion. This "structured” hidden layer facilitates 
learning, but leads to poor generalization. 


To remedy this, a more complex architecture was 
introduced (see Figure 3), by inserting another layer of 
hidden units into the previous structure between the input 
layer and the hidden layer. This new hidden layer consists 
of two sets of units.. One set has one unit corresponding to 
each unit in the goal location input layer and receives input 
from that unit alone among the goal location input units; all 
of the units in that set receive input from all units in the 
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current location input set. That is, that set in the first 
hidden layer receives one-to-one connections (thin arrow) 
from the goal location input set and is fully connected to 
the current location input set (thick arrow). The other 
hidden set has a complementary set of connections. The 
one-to-one connections were gates, or multiplicative 
connections ; that is, unless input was received from one of 
these connections the hidden unit did not respond. 

With this architecture, the network was able to learn the 5 
by 5 environment perfectly, as shown in Figure 4. 
However, in this case learning is quite brittle. The 
network is now nothing more than a lookup table, since it 
has specific weights corresponding to every input 
combination. Thus, there is no generalization of 
information from one learning trial to any other situation. 
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Figure 4. The results from a partially connected BPL 
network. See text for description. 


A Hebbian/Interactive activation approach 
to navigation: 

Method and results 

In this network, the environment was represented similarly 
to the above input representation, in that there is a unit for 
every landmark in the environment. Again, in 
consideration of designing the computer simulation, a 
rectangular grid was used. However, the architecture was 
quite different. In this network, all units were connected 
(initially) to all other units. Simulations using this model 
were performed in more complex environments; here, not 
every grid element was connected to its four neighbors. 
Instead environments, such as that shown in Figure 5, 
were used. 

X — X — X X — X — X 

I II I 

X X — X X X — X 

I I I I I 

X — X X — X — X ~ X 

I 

X — X — X — X — X — X 

I I I 

X X — X X — X X 

I II I 

X — X — X — X — X — X 

Figure 5. An example of a maze environment, such as 
was used in the HIA simulations. 

Training was accomplished by an exploratory "wandering” 
process through the maze. Each cycle of the simulation 
began with taking a random step from the current position 
to a neighbor (neighbors in this case were defined by the 
links in the maze), with no backtracking unless necessary. 
Upon arrival at a node, the corresponding unit in the 
activation network was activated. Activatation in each unit 
would decay by a fraction a, with each cycle of the 
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simulation. The weights bewteen units would increase in 
proportion to the product of the activity in the two units, 
and decay by another factor p. Appropriate choice of a 
and p led to a situation in which the weights between 
adjacent units were much stronger than the weights 
bewteen units two or more steps removed. Hence, these 
were all set to zero, and the neural net became isomorphic 
to the maze. This network was used to compute paths 
between arbitrary points in the maze by the following three 
stage procedure: 

[1] The unit, g, corresponding to the goal is stimulated 
continuously at a high level, K, and activation spreads 
through the network via repeated iteration of matrix 
multiplication and simultaneous exponential decay, until 
the unit, c, corresponding to the current position is 
activated to a criterion level 0: 

A ft) = X W jk A k (t) ' nA j (t) for 
k 

A g (t) = K until A c > 0 

[2] The resulting pattern of activation A is then multiplied, 
element by element, by the pattern of the squares of the 
weights connecting c to the other units, W c . 

[3] The current position is then updated, by moving to the 
unit with the greatest resulting product: 

c(t+l) = index j that gives a maximum for 


Steps [2] and [3] are repeated until the goal is reached. 

This method was found to work quite well over a set of 
different mazes, usually finding the shortest path. In cases 
where the shortest path was not found, the result was close 
to the optimum. 


Discussion 

While BPL was found to be inadequate for solving 
relatively simple problems, it should be recognized that it 
frequently requires considerable time and effort (and 
educated guesswork) to apply it successfully to a particular 
problem. The pattern representations must be carefully 
considered. Also, the network architecture and even such 
parameters as the learning rate and the momemtum (see 
Rumelhart, Hinton, and Williams, 1986 for a detailed 
description) can be critical in determining the success of a 
particular simulation. Thus, while the results reported 
above are discouraging, it is too soon to dismiss this 
approach. 

The second technique, HIA, performed much better. The 
algorithm for finding a path is not especially novel; it is 
essentially equivalent to searching though a graph for the 
shortest path. The novelty is in the Hebbian modification 
technique used to construct the graph via temporally 
correlated activitations. This is somewhat sensitive to the 
parameters a and p. Further work is required for a general 
solution using this approach. 

Other future plans include using more sophisticated 
representations for location, using multiple maps of the 
environment, such as maps for various types of 
transportation (e.g., walking vs. driving), or maps 
covering various scales (e.g., city maps vs. world maps). 
Recent work (Munro & Hirtle, 1989) has shown how the 
interactive activation model can account for a variety of 
documented psychological data, which indicates 
interactions between internal representations of different 
maps in free recall of geographical information. 
Conceivably, a hybrid technique, involving both the BPL 
and HIA methods will be used. Such a combination 
would probably use HIA for the high level planning and 
BPL to issue the action commands to the drive mechanism. 
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